FlowFlex: Malleable Scheduling for Flows of MapReduce Jobs

نویسندگان

  • Viswanath Nagarajan
  • Joel L. Wolf
  • Andrey Balmin
  • Kirsten Hildrum
چکیده

We introduce FlowFlex, a highly generic and effective scheduler for flows of MapReduce jobs connected by precedence constraints. Such a flow can result, for example, from a single user-level Pig, Hive or Jaql query. Each flow is associated with an arbitrary function describing the cost incurred in completing the flow at a particular time. The overall objective is to minimize either the total cost (minisum) or the maximum cost (minimax) of the flows. Our contributions are both theoretical and practical. Theoretically, we advance the state of the art in malleable parallel scheduling with precedence constraints. We employ resource augmentation analysis to provide bicriteria approximation algorithms for both minisum and minimax objective functions. As corollaries, we obtain approximation algorithms for total weighted completion time (and thus average completion time and average stretch), and for maximum weighted completion time (and thus makespan and maximum stretch). Practically, the average case performance of the FlowFlex scheduler is excellent, significantly better than other approaches. Specifically, we demonstrate via extensive experiments the overall performance of FlowFlex relative to optimal and also relative to other, standard MapReduce scheduling schemes. All told, FlowFlex dramatically extends the capabilities of the earlier Flex scheduler for singleton MapReduce jobs while simultaneously providing a solid theoretical foundation for both.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Elastic Phoenix: Malleable MapReduce for Shared-Memory Systems

We present the design, implementation, and an evaluation of Elastic Phoenix. Based on the original Phoenix from Stanford, Elastic Phoenix is also a MapReduce implementation for shared-memory systems. The key new feature of Elastic Phoenix is that it supports malleable jobs: the ability add and remove worker processes during the execution of a job. With the original Phoenix, the number of proces...

متن کامل

Phurti: Application and Network-aware Flow Scheduling for Mapreduce

Traffic for a typical MapReduce job in a datacenter consists of multiple network flows. Traditionally, network resources have been allocated to optimize network-level metrics such as flow completion time or throughput. Some recent schemes propose using application-aware scheduling which can reduce the average job completion time. However, most of them treat the core network as a black box with ...

متن کامل

On scheduling malleable jobs to minimise the total weighted completion time

This paper is about scheduling parallel jobs, i.e. which can be executed on more than one processor at the same time. Malleable jobs is a special class of parallel jobs. The number of processors a malleable job is executed on may change during the execution. In this work, we consider the NP-hard problem of scheduling malleable jobs to minimize the total weighted completion time or mean weighted...

متن کامل

Real-Time Scheduling of Skewed MapReduce Jobs in Heterogeneous Environments

Supporting real-time jobs on MapReduce systems is particularly challenging due to the heterogeneity of the environment, the load imbalance caused by skewed data blocks, as well as real-time response demands imposed by the applications. In this paper we describe our approach for scheduling real-time, skewed MapReduce jobs in heterogeneous systems. Our approach comprises the following components:...

متن کامل

Scheduling and Energy Efficiency Improvement Techniques for Hadoop Map-reduce: State of Art and Directions for Future Research

MapReduce has become ubiquitous for processing large data volume jobs. As the number and variety of jobs to be executed across heterogeneous clusters are increasing, so is the complexity of scheduling them efficiently to meet required objectives of performance. This report presents a survey of some of the MapReduce scheduling algorithms proposed for such complex scenarios. A taxonomy is provide...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013